Terms
updated_time: a per-file field; when a user edits a file, its updated_time is modified. When the file is pushed to the remote, it keeps the same value. Common terms: local.updated_time, remote.updated_time.
sync_time (a.k.a. lastSync, context.timestamp): a per-file field representing its most recent sync time.
=> local_changed_since_last_sync = local.updated_time > sync_time
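As a minimal sketch of the terms above (the SyncedFile shape and field names are my own illustration, not Joplin's types):

```typescript
// Hypothetical shape mirroring the terms above.
interface SyncedFile {
	path: string;
	updated_time: number; // set when a user edits the file; preserved on push
	sync_time: number;    // time of the file's most recent sync
}

// A file has local changes if it was edited after its last sync.
function localChangedSinceLastSync(local: SyncedFile): boolean {
	return local.updated_time > local.sync_time;
}

const file: SyncedFile = { path: 'note.md', updated_time: 2000, sync_time: 1000 };
console.log(localChangedSinceLastSync(file)); // true
```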
basicDelta algorithm
in src/FileApi/FileApi.ts, FileApi.basicDelta method
Used by the local filesystem API
- Overview: this algorithm finds which files have changed since the last sync. A timestamp is passed into the algorithm; the general conditions for files are:
- updated_time < timestamp (file older than timestamp): skipped, no need to sync
- updated_time > timestamp (file newer than timestamp): tracked, needs to sync
- updated_time == timestamp (see edge case below)
- context: this holds the sync context. Because this function runs multiple times, each call needs to know what its previous call did. That information is stored in the context and passed as a parameter.
- Get the stat of each file; the stat should contain the file's updated_time. This operation should not read file contents, so it can be performed on a massive number of files without performance issues.
=> Sort them so the oldest items come first (used later for binary search). The pseudo-code is roughly:

for each stat above (oldest first):
    if stat.updated_time < timestamp (file older):
        skip
    if stat.updated_time == timestamp:
        // if a previous sync call has already processed it
        if context.filesAtTimestamp.has(stat):
            skip
    // additional tracking:
    // track the highest (newest) timestamp,
    // and track filesAtTimestamp for the next sync
    if stat.updated_time > newest_timestamp:
        newest_timestamp = stat.updated_time
        // reset after a timestamp jump
        newContext.filesAtTimestamp = []
    // after passing the above checks
    output.push(stat)
    // for the next sync
    newContext.filesAtTimestamp.push(stat)
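The steps above can be sketched in TypeScript. This is a simplified illustration, not Joplin's actual FileApi.basicDelta; the Stat and DeltaContext shapes are assumptions:

```typescript
interface Stat { path: string; updated_time: number; }
interface DeltaContext { timestamp: number; filesAtTimestamp: string[]; }

// Simplified sketch of the basicDelta logic described above.
function basicDeltaSketch(stats: Stat[], context: DeltaContext) {
	// Oldest items first, as in the notes above.
	const sorted = [...stats].sort((a, b) => a.updated_time - b.updated_time);

	const output: Stat[] = [];
	const newContext: DeltaContext = {
		timestamp: context.timestamp,
		filesAtTimestamp: [...context.filesAtTimestamp],
	};

	for (const stat of sorted) {
		// Older than the last sync: skip.
		if (stat.updated_time < context.timestamp) continue;
		// Exactly at the timestamp: skip only if a previous call already processed it.
		if (stat.updated_time === context.timestamp
			&& context.filesAtTimestamp.includes(stat.path)) continue;

		if (stat.updated_time > newContext.timestamp) {
			// Timestamp jump: track the newest time and reset the equal-time set.
			newContext.timestamp = stat.updated_time;
			newContext.filesAtTimestamp = [];
		}

		output.push(stat);
		newContext.filesAtTimestamp.push(stat.path);
	}

	return { items: output, context: newContext };
}
```

Feeding the returned context into the next call is what prevents re-syncing the files recorded in filesAtTimestamp.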
Edge cases
- Files at timestamp
// Note that we keep both the timestamp of the most recent change, *and* the items that exactly match this timestamp. This is to handle cases where an item is modified while this delta function is running.
// For example:
// t0: Item 1 is changed
// t0: Sync items - run delta function
// t0: While delta() is running, modify Item 2
// Since item 2 was modified within the same millisecond, it would be skipped in the next sync if we relied exclusively on a timestamp.
=> context.filesAtTimestamp contains the paths of the files whose updated_time equals the (newest) timestamp of that sync.
- Tracking equal timestamps avoids the skip issue above, but may cause the same file to be re-synced multiple times => we need to check filesAtTimestamp and skip already-processed files.
- At first it seems that files with updated_time >= timestamp are pushed into filesAtTimestamp, but by the end only the == ones remain. Whenever a file with updated_time > timestamp appears, filesAtTimestamp is cleared, so only the entries equal to the final (newest) timestamp survive.
UPDATE
Filesystem
- Based on basic delta above
- Input: local_item (the item needing an update; requires id + sync_time)
- How it works:
- Fetch the remote_item (I'll refer to it as remote). If remote is not found, the item has never been synced; use the CREATE operation instead.
- Check remote_item against lastSync:
remote.updated_time > local.sync_time: remote has changed since the last sync => conflict => pull and resolve first
remote.updated_time == local.sync_time: remote hasn't changed => no conflict => allow the update
remote.updated_time < local.sync_time: this shouldn't happen; the file is likely corrupted. Back up your data (local and remote) and re-create the item.
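The pre-upload check above can be sketched as follows (checkUpdate and UpdateDecision are my own names, not Joplin's API):

```typescript
type UpdateDecision = 'allow' | 'conflict' | 'corrupted';

// Decide whether a local item may be pushed, given the remote stat.
function checkUpdate(remoteUpdatedTime: number, localSyncTime: number): UpdateDecision {
	// Remote changed since the last sync: pull and resolve first.
	if (remoteUpdatedTime > localSyncTime) return 'conflict';
	// Remote unchanged since the last sync: safe to upload.
	if (remoteUpdatedTime === localSyncTime) return 'allow';
	// Remote older than the last sync: should never happen.
	return 'corrupted';
}
```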
READ with timestamp (PULL operation)
Input: context.timestamp
-> If the previous sync completed successfully, this timestamp should equal the maximum (newest) sync_time across all local files.
As explained in basicDelta, this operation gets all remote files newer than the timestamp (remote changes since the last sync):
remote.updated_time > timestamp
To apply this to the local file, further check:
remote.updated_time > local.updated_time (ensures the remote change is actually newer)
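A sketch of this pull-side filter (itemsToApply and the Item shape are assumptions): the delta step has already returned remote stats newer than context.timestamp, and each one is applied only if strictly newer than the local copy.

```typescript
interface Item { path: string; updated_time: number; }

// Given remote items from the delta and a map of local items by path,
// keep only the remote items that should be applied locally.
function itemsToApply(remoteDelta: Item[], localByPath: Map<string, Item>): Item[] {
	return remoteDelta.filter(remote => {
		const local = localByPath.get(remote.path);
		if (!local) return true; // no local copy: create it
		return remote.updated_time > local.updated_time; // apply only if remote is newer
	});
}
```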
WHY?
Tracking a single timestamp and comparing every remote stat against it (a mass stat check with no costly reads) is much more efficient than comparing each remote item to a local updated_time -> an efficient check for whether the remote has changed.
If local made a change but didn't sync, there's a case where both the server and the local copy have changed, which causes a conflict. Hence we also need to check remote.updated_time > local.updated_time.
Conclusion: applying remote changes to local is allowed = the server has changed and local hasn't changed since the last sync = remote.updated_time > last_sync_time (remote has changed) and remote.updated_time > local.updated_time (local hasn't changed).
local.updated_time > local.sync_time (local changed) + remote.updated_time > local.sync_time (remote changed since the last file sync) = conflict
local.updated_time > local.sync_time (local changed) + remote.updated_time == local.sync_time (remote unchanged) = upload local to remote
(assuming all local changes are uploaded in this step)
remote > timestamp (delta: all remote changes since last sync) + remote.updated_time > local.updated_time (remote newer) = apply remote to local
Note: remote.updated_time > local.updated_time by itself overlooks the local-changes case; it shows nothing beyond the fact that the two values are unequal.
remote > timestamp (delta: all remote changes since last sync) + remote.updated_time == local.updated_time (same version) = no fetch
remote > timestamp (delta: all remote changes since last sync) + remote.updated_time < local.updated_time (local newer) = shouldn't happen; this case is already resolved in the upload step
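The case analysis above can be condensed into one decision function. This is a hedged sketch; decide, SyncAction, and the parameter names are my own, not Joplin's code:

```typescript
type SyncAction = 'upload' | 'conflict' | 'applyRemote' | 'noop';

// Combine the local/remote/sync_time comparisons from the notes above
// into a single decision for one item.
function decide(localUpdated: number, remoteUpdated: number, syncTime: number): SyncAction {
	const localChanged = localUpdated > syncTime;   // local edited since last sync
	const remoteChanged = remoteUpdated > syncTime; // remote changed since last sync
	if (localChanged && remoteChanged) return 'conflict';
	if (localChanged) return 'upload';
	if (remoteChanged && remoteUpdated > localUpdated) return 'applyRemote';
	return 'noop'; // same version, or the shouldn't-happen local-newer case
}
```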